---
title: PXF Developer Concepts
---
The PXF SDK provides the classes and interfaces that you use to implement support for new external data sources, data formats, and data access APIs in Greenplum Database. You can also use the PXF SDK to extend existing external data sources, formats, and APIs.

When you develop with the PXF SDK, you ultimately build a JAR file. The Greenplum Database administrator deploys your JAR file to the Greenplum Database cluster. The Greenplum Database end user accesses your external data source, format, or API by creating an external table with a `CREATE EXTERNAL TABLE` command that specifies the `pxf` protocol.

This topic introduces concepts central to developing with the PXF SDK. Later sections of this guide discuss these concepts in detail.

## <a id="architecture"></a>Plug-ins, Connectors, and Profiles

The PXF API includes the *Fragmenter* class and read and write *Accessor* and *Resolver* interfaces. You implement these classes and interfaces when you extend PXF to add support for a new external data source, data format, or data access API. The classes that you create are called *plug-ins*.

A *connector* is a set of plug-in classes that support read and/or write operations on an external data source:

- A single *Fragmenter*, *Accessor*, and *Resolver* plug-in class together make up a *read connector*.
- An *Accessor* and *Resolver* plug-in pair makes up a *write connector*.
- A single *Accessor* or *Resolver* class may implement both read and write operations.

A *profile* is a simple name mapping to a set of connector plug-in class names. You or the Greenplum Database administrator may choose to configure one or more profiles for your connector as a convenience for the Greenplum Database end user.
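To make the relationship between a profile and its plug-ins concrete, the following minimal Java sketch models a profile as a name that maps to a set of plug-in class names. Everything in it is an assumption for illustration: `ProfileRegistrySketch`, `ProfileSketch`, the profile name `demo`, and the `com.example.pxf` class names are hypothetical and are not part of the PXF SDK, and the sketch does not reflect how PXF actually stores or loads profile definitions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table from profile name to plug-in class names.
// NOT part of the PXF SDK; it only illustrates the terminology.
final class ProfileRegistrySketch {
    private final Map<String, ProfileSketch> profiles = new HashMap<>();

    // Registers a profile under its name.
    void register(ProfileSketch profile) {
        profiles.put(profile.name, profile);
    }

    // Resolves a profile name to its plug-in class names.
    ProfileSketch lookup(String name) {
        ProfileSketch profile = profiles.get(name);
        if (profile == null) {
            throw new IllegalArgumentException("Unknown profile: " + name);
        }
        return profile;
    }

    public static void main(String[] args) {
        ProfileRegistrySketch registry = new ProfileRegistrySketch();
        // All class names below are made up for this example.
        registry.register(new ProfileSketch(
                "demo",
                "com.example.pxf.DemoFragmenter",
                "com.example.pxf.DemoAccessor",
                "com.example.pxf.DemoResolver"));

        ProfileSketch p = registry.lookup("demo");
        System.out.println(p.name + " -> " + p.fragmenter + ", " + p.accessor + ", " + p.resolver);
    }
}

// Hypothetical value type: one profile name plus the connector's plug-in class names.
final class ProfileSketch {
    final String name;
    final String fragmenter; // may be null when the connector only supports writes
    final String accessor;
    final String resolver;

    ProfileSketch(String name, String fragmenter, String accessor, String resolver) {
        this.name = name;
        this.fragmenter = fragmenter;
        this.accessor = accessor;
        this.resolver = resolver;
    }
}
```

The point of the sketch is that the end user only needs to know the short profile name, while the connector's plug-in class names stay behind the mapping.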

Summary of terms:

| Term      | Description                                                                                     |
|-----------|-------------------------------------------------------------------------------------------------|
| Plug-in   | A *Fragmenter*, *Accessor*, or *Resolver* class implementation.                                 |
| Connector | A set of plug-in classes that support read and/or write operations on an external data source. |
| Profile   | A name mapping to a set of connector plug-in class names.                                       |


## <a id="dataflow"></a>Data Flow

PXF in Greenplum Database has two components:

- A C shared library that is loaded into Greenplum Database when the `CREATE EXTENSION pxf` command is invoked on a database.
- A Java service, referred to as the PXF agent, that runs as a single JVM process on each Greenplum Database segment host. You start the PXF agent when you run `pxf cluster start`.

Operations on Greenplum Database external tables are routed first to the PXF C shared library extension, and then on to the PXF agent.

The PXF C library validates PXF-specific parameters in the `CREATE EXTERNAL TABLE` command. The PXF agent invokes a connector's plug-in classes only after the end user performs a `SELECT` (read) or `INSERT` (write) operation on the external table.

The PXF agent initiates a read operation on the external data source when the user runs a `SELECT` command on a PXF external table. The PXF agent spawns a thread that invokes the connector *Fragmenter*, which splits data from an external data source into a list of fragments that can be read in parallel. A read *Accessor* reads a single fragment from an external data source and produces a list of records/rows. The read *Resolver* deserializes a record/row into fields. Finally, PXF translates these fields into Greenplum Database table column values. 
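The read path described above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the PXF implementation: the `SketchFragmenter`, `SketchReadAccessor`, and `SketchReadResolver` interfaces are simplified stand-ins invented for this example and do not match the real PXF SDK signatures, the in-memory comma-separated data is made up, and fragments are processed sequentially here even though PXF can read them in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReadFlowSketch {
    // Runs the read path: data source -> fragments -> records -> fields.
    static List<List<String>> readAll(SketchFragmenter fragmenter,
                                      SketchReadAccessor accessor,
                                      SketchReadResolver resolver) {
        List<List<String>> rows = new ArrayList<>();
        // Sequential for simplicity; the real agent can read fragments in parallel.
        for (String fragment : fragmenter.getFragments()) {
            Iterator<String> records = accessor.read(fragment);
            while (records.hasNext()) {
                // Deserialize one record into its fields.
                rows.add(resolver.getFields(records.next()));
            }
        }
        return rows;
    }

    // Wires up a made-up in-memory data source: two fragments of comma-separated lines.
    static List<List<String>> demo() {
        SketchFragmenter fragmenter = () -> Arrays.asList("1,alice\n2,bob", "3,carol");
        SketchReadAccessor accessor = fragment -> Arrays.asList(fragment.split("\n")).iterator();
        SketchReadResolver resolver = record -> Arrays.asList(record.split(","));
        return readAll(fragmenter, accessor, resolver);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[1, alice], [2, bob], [3, carol]]
    }
}

// Simplified stand-ins for the plug-in roles; NOT the real PXF SDK signatures.
interface SketchFragmenter {
    List<String> getFragments();              // splits the source into fragments
}

interface SketchReadAccessor {
    Iterator<String> read(String fragment);   // reads one fragment, yields records
}

interface SketchReadResolver {
    List<String> getFields(String record);    // deserializes one record into fields
}
```

The final step in the text, translating fields into Greenplum Database column values, is performed by PXF itself and is therefore left out of the sketch.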

<img src="../graphics/pxfreadwrite.png" class="image" />

The PXF agent initiates a write operation to the external data source when the user runs an `INSERT` or similar command on a PXF writable external table. When writing to an external data source, PXF translates Greenplum Database table column values into fields and invokes the connector write *Resolver*. The write *Resolver* serializes these fields into a record. The write *Accessor* writes a record directly to the external data source.
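The write path can be sketched the same way. As before, this is a hedged illustration rather than the PXF implementation: `SketchWriteResolver` and `SketchWriteAccessor` are simplified stand-ins invented for this example and do not match the real PXF SDK signatures, and an in-memory list plays the role of the external data source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WriteFlowSketch {
    // Runs the write path: fields -> record -> data source.
    static void writeAll(List<List<String>> rows,
                         SketchWriteResolver resolver,
                         SketchWriteAccessor accessor) {
        for (List<String> fields : rows) {
            String record = resolver.setFields(fields); // serialize fields into one record
            accessor.write(record);                     // write the record to the data source
        }
    }

    // Writes two made-up rows to an in-memory list standing in for the data source.
    static List<String> demo() {
        List<String> sink = new ArrayList<>();
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "alice"),
                Arrays.asList("2", "bob"));
        writeAll(rows, fields -> String.join(",", fields), sink::add);
        return sink;
    }

    public static void main(String[] args) {
        for (String record : demo()) {
            System.out.println(record);
        }
    }
}

// Simplified stand-ins for the plug-in roles; NOT the real PXF SDK signatures.
interface SketchWriteResolver {
    String setFields(List<String> fields);  // serializes fields into one record
}

interface SketchWriteAccessor {
    void write(String record);              // writes one record to the data source
}
```

Note that the order of the plug-ins is the reverse of the read path: the *Resolver* runs first to build the record, and the *Accessor* runs last to persist it. No *Fragmenter* is involved.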

